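Over the past few days we have worked through the three pillars of observability one at a time: logs (Loki), traces (Tempo), and metrics (Mimir). Today we reach a milestone: bringing all three together in a unified Grafana dashboard, for a true "single pane of glass" monitoring experience.
In traditional monitoring setups, metrics, logs, and traces usually live in separate systems. When something breaks, engineers have to switch back and forth between tools and interfaces, piecing the clues together by hand. That process is slow and inefficient.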
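A unified dashboard lets us see all three signals in one place, over the same time range, and move from one to the next without leaving Grafana.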
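First, create the following files and folders under the day27 directory. The contents of every file are listed in the complete configuration files further down.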
day27/
├── docker-compose.yml
├── loki-config.yaml
├── mimir-config.yaml
├── prometheus.yml
├── README.md
└── grafana-provisioning/
    ├── datasources/
    │   └── datasource.yml
    └── dashboards/
        ├── dashboard.yml
        └── main-dashboard.json
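Once every file has been created and filled in with the correct contents, run the following Docker Compose command from the day27 root directory to start all services.
# -d runs the services in the background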
docker-compose up -d
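Containers can take a few seconds to become ready after the command returns. The sketch below is one way to probe each service before opening Grafana. It assumes the host ports from docker-compose.yml and each service's readiness endpoint (`/api/health` for Grafana, `/ready` for Loki, Tempo, and Mimir); the helper names are made up for this example.

```python
import urllib.error
import urllib.request

# Host ports come from docker-compose.yml; paths are the readiness endpoints.
READINESS_URLS = {
    "grafana": "http://localhost:3000/api/health",
    "loki": "http://localhost:3100/ready",
    "tempo": "http://localhost:3200/ready",
    "mimir": "http://localhost:9009/ready",
}


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer such as 503.
        return False


def check_all(urls: dict[str, str]) -> dict[str, bool]:
    """Probe every service once and report which ones are ready."""
    return {name: is_ready(url) for name, url in urls.items()}
```

Calling `check_all(READINESS_URLS)` returns a mapping from service name to `True` or `False`. A service that still reports `False` after a minute or so is worth inspecting with `docker-compose logs`.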
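Once the services are up, Grafana automatically does two things based on the provisioning files: it registers the Loki, Mimir, and Tempo data sources from datasource.yml, and it loads the dashboard defined in main-dashboard.json.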
Open Grafana: visit http://localhost:3000 in your browser.
Find the dashboard: click the Dashboards icon in the left-hand menu. You should see a dashboard named Day 27: Unified Dashboard; click it to open.
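The dashboard is split into two parts:
The Mimir: HTTP Requests Total panel shows metric data queried from Mimir. This is our high-level view of system health.
The Loki: Logs panel shows log data queried from Loki. When the metrics panel shows an anomaly, we can immediately look at the logs for the same time range.
This is where an integrated dashboard pays off. Picture a typical troubleshooting scenario:
If a log line contains a trace_id (the link is already configured in datasource.yml), clicking it in the logs panel takes you straight to Tempo, where you can inspect the complete distributed trace behind that error log line.
Below are the complete contents of every configuration file used in this exercise.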
docker-compose.yml
version: '3.8'
services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
    command: -config.file=/etc/loki/local-config.yaml
  tempo:
    image: grafana/tempo:2.2.0
    ports:
      - "3200:3200" # Tempo
      - "4317:4317" # OTLP gRPC
  mimir:
    image: grafana/mimir:2.9.0
    ports:
      - "9009:9009"
    volumes:
      - ./mimir-config.yaml:/etc/mimir.yaml
      - mimir-data:/data/mimir
    command: -config.file=/etc/mimir.yaml
  prometheus:
    image: prom/prometheus:v2.47.0
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command: --config.file=/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana:10.0.3
    ports:
      - "3000:3000"
    volumes:
      - ./grafana-provisioning/datasources:/etc/grafana/provisioning/datasources
      - ./grafana-provisioning/dashboards:/etc/grafana/provisioning/dashboards
volumes:
  mimir-data:
loki-config.yaml
auth_enabled: false
server:
  http_listen_port: 3100
common:
  instance_addr: 127.0.0.1
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
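Nothing in this stack ships logs to Loki on its own, so the logs panel may stay empty until something writes to it. As a minimal sketch, a test line can be sent through Loki's HTTP push API (`/loki/api/v1/push`). The label `job="mimir"` matches the dashboard query. The function names and the sample log line are made up for this example.

```python
import json
import time
import urllib.request

# Host port 3100 comes from docker-compose.yml.
LOKI_PUSH_URL = "http://localhost:3100/loki/api/v1/push"


def build_push_payload(line: str, labels: dict[str, str], ts_ns: int) -> dict:
    """Build the JSON body Loki expects: one stream with one (timestamp, line) pair."""
    # Loki wants the timestamp as a string of nanoseconds since the Unix epoch.
    return {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}


def push_line(line: str, labels: dict[str, str], url: str = LOKI_PUSH_URL) -> int:
    """Send a single log line to Loki and return the HTTP status code."""
    payload = build_push_payload(line, labels, time.time_ns())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status  # 204 No Content on success
```

For example, `push_line('level=error msg="upstream timeout" trace_id=4bf92f3577b34da6a3ce929d0e0e4736', {"job": "mimir"})` should make one line appear in the Loki: Logs panel, with the trace ID in the `trace_id=` form that the derived field in datasource.yml matches.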
mimir-config.yaml
target: all
# Mimir renamed Cortex's auth_enabled option to multitenancy_enabled
multitenancy_enabled: false
server:
  http_listen_port: 9009
  grpc_listen_port: 9095
distributor:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
ingester:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
    replication_factor: 1
    # Mimir has no Cortex-style lifecycler block or max_transfer_retries option;
    # final_sleep is set on the ingester ring instead
    final_sleep: 0s
ruler:
  alertmanager_url: http://localhost
  ring:
    kvstore:
      store: inmemory
blocks_storage:
  backend: filesystem
  filesystem:
    dir: /data/mimir/blocks
compactor:
  data_dir: /data/mimir/compactor
  sharding_ring:
    kvstore:
      store: inmemory
store_gateway:
  sharding_ring:
    kvstore:
      store: inmemory
prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'mimir'
    static_configs:
      - targets: ['mimir:9009']
remote_write:
  - url: "http://mimir:9009/api/v1/push"
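To confirm that remote write is actually delivering samples, the same PromQL expression the dashboard uses can be run against Mimir's Prometheus-compatible query API. This is a sketch under two assumptions: Mimir is reachable on host port 9009 as in docker-compose.yml, and multi-tenancy is disabled so no tenant header is needed. The helper names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Same base path the Grafana data source uses, but via the published host port.
MIMIR_PROMETHEUS_BASE = "http://localhost:9009/prometheus"


def build_query_url(expr: str, base: str = MIMIR_PROMETHEUS_BASE) -> str:
    """Return the instant-query URL for a PromQL expression."""
    query_string = urllib.parse.urlencode({"query": expr})
    return f"{base}/api/v1/query?{query_string}"


def instant_query(expr: str) -> dict:
    """Run an instant query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(expr), timeout=5) as response:
        return json.load(response)
```

Calling `instant_query("rate(prometheus_http_requests_total[5m])")` should return a response whose `status` field is `"success"`. An empty `result` list shortly after startup usually just means Prometheus has not yet scraped and forwarded enough samples.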
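grafana-provisioning/datasources/datasource.yml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    # Fixed UID so the dashboard JSON and Tempo's tracesToLogs can reference this data source
    uid: loki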
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - datasourceUid: tempo
          matcherRegex: 'trace_id=(\w+)'
          name: TraceID
          url: '$${__value.raw}'
  - name: Mimir
    type: prometheus
    # Fixed UID so the dashboard JSON can reference this data source
    uid: mimir
    access: proxy
    url: http://mimir:9009/prometheus
    isDefault: true
  - name: Tempo
    type: tempo
    # Fixed UID so Loki's derived field (datasourceUid: tempo) resolves
    uid: tempo
    access: proxy
    url: http://tempo:3200
    jsonData:
      tracesToLogs:
        datasourceUid: 'loki'
        tags: ['job', 'instance', 'pod', 'namespace']
        mappedTags: [{ key: 'service.name', value: 'job' }]
        spanStartTimeShift: '1s'
        spanEndTimeShift: '-1s'
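The derived field in the Loki data source hinges on the `matcherRegex` value `trace_id=(\w+)`: whatever the first capture group matches becomes the value handed to Tempo. The sketch below uses Python's `re` module as a stand-in for Grafana's own regex engine, to show which log lines would produce a link. The function name and sample lines are made up for this example.

```python
import re
from typing import Optional

# Same pattern as matcherRegex in datasource.yml.
MATCHER_REGEX = re.compile(r"trace_id=(\w+)")


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the first capture group of the matcher, or None if the line has no match."""
    match = MATCHER_REGEX.search(log_line)
    return match.group(1) if match else None
```

A logfmt-style line such as `level=error trace_id=4bf92f3577b34da6a3ce929d0e0e4736 status=500` yields the trace ID, while a JSON-style line such as `{"trace_id":"4bf92f35"}` yields nothing because there is no literal `trace_id=` in it. If your application logs in JSON, the pattern would need to be adapted.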
grafana-provisioning/dashboards/dashboard.yml
apiVersion: 1
providers:
- name: 'default'
  orgId: 1
  folder: ''
  type: file
  disableDeletion: false
  editable: true
  options:
    path: /etc/grafana/provisioning/dashboards
grafana-provisioning/dashboards/main-dashboard.json
{
  "__inputs": [],
  "__requires": [],
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": 1,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "title": "Mimir: HTTP Requests Total",
      "type": "timeseries",
      "datasource": {
        "type": "prometheus",
        "uid": "mimir"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "mimir"
          },
          "expr": "rate(prometheus_http_requests_total[5m])",
          "legendFormat": "{{handler}}"
        }
      ]
    },
    {
      "title": "Loki: Logs",
      "type": "logs",
      "datasource": {
        "type": "loki",
        "uid": "loki"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 0
      },
      "targets": [
        {
          "datasource": {
            "type": "loki",
            "uid": "loki"
          },
          "expr": "{job=\"mimir\"}"
        }
      ]
    }
  ],
  "schemaVersion": 37,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "Day 27: Unified Dashboard",
  "uid": "day27-unified",
  "version": 1,
  "weekStart": ""
}
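A common reason for a provisioned panel to show "data source not found" is a `uid` in the dashboard JSON that does not match any provisioned data source. The following sketch checks a dashboard, already parsed with `json.load`, against a set of known UIDs. The set `{"loki", "mimir", "tempo"}` is an assumption about how the data sources are identified in datasource.yml, and the function name is made up for this example.

```python
# Assumed UIDs of the provisioned data sources; adjust to match datasource.yml.
PROVISIONED_UIDS = {"loki", "mimir", "tempo"}


def find_unknown_datasource_uids(dashboard: dict, known: set[str]) -> list[tuple[str, str]]:
    """Return (panel title, uid) pairs for every data source reference not in `known`."""
    problems = []
    for panel in dashboard.get("panels", []):
        # A panel carries one reference itself plus one per query target.
        references = [panel.get("datasource")]
        references += [target.get("datasource") for target in panel.get("targets", [])]
        for reference in references:
            uid = reference.get("uid") if isinstance(reference, dict) else None
            if uid is not None and uid not in known:
                problems.append((panel.get("title", "<untitled>"), uid))
    return problems
```

An empty list means every panel and target points at a known data source. Each unknown reference is reported separately, so a panel whose own reference and target both use the same wrong UID appears twice.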
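Today we brought the three pillars of observability (metrics, logs, and traces) together in one unified Grafana dashboard. We learned how to configure such a dashboard and, more importantly, why it is so valuable for monitoring and troubleshooting modern software systems.
With this setup we are no longer staring at isolated data points; we are following a complete story, from the big-picture trends in the metrics, to the specific details in the logs, to the full context of a trace.
That completes the core learning path for the Grafana observability stack (Loki, Grafana, Mimir/Metrics, Tempo). Congratulations!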